Abstract

When are asymptotic approximations using the delta-method uniformly valid? We
provide sufficient conditions as well as closely related necessary conditions for uniform
negligibility of the remainder of such approximations. These conditions are easily verified
and make it possible to identify settings and parameter regions where pointwise asymptotic
approximations perform poorly. Our framework allows for a unified and transparent discussion
of uniformity issues in various sub-fields of econometrics. Our conditions involve uniform
bounds on the remainder of a first-order approximation for the function of interest.

1 Introduction


Many econometric procedures are motivated and justified using asymptotic approximations. 
Standard asymptotic theory provides approximations for fixed parameter values, letting the 
sample size go to infinity. Procedures for estimation, testing, or the construction of confidence 
sets are considered justified if they perform well for large sample sizes, for any given parameter 
value. 

Procedures that are justified in this sense might unfortunately still perform poorly for
arbitrarily large samples. This happens if the asymptotic approximations invoked are not uniformly
valid. In that case there are parameter values for every sample size such that the approximation 
is poor, even though for every given parameter value the approximation performs well for large 
enough sample sizes. Which parameter values cause poor behavior might depend on sample 
size, so that poor behavior does not show up in standard asymptotics. If a procedure is not 
uniformly valid, this can lead to various problems, including (i) large bias and mean squared 
error for estimators, (ii) undercoverage of confidence sets, and (iii) severe size distortions of 
tests. 

Uniformity concerns are central to a number of sub-fields of econometrics. The econometrics 
literature has mostly focused on uniform size control in testing and the construction of confidence 
sets. Uniform validity of asymptotic approximations is however a more general issue, and is 
important even if we are not interested in uniform size control, but instead have decision theoretic 
or other criteria for the quality of an econometric procedure. Literatures that have focused on
uniformity issues include the literature on weak instruments, e.g. Staiger and Stock (1997),
the literature on inference under partial identification in general and moment inequalities in
particular, e.g. Imbens and Manski (2004), and the literature on pre-testing and model selection,
e.g. Leeb and Pötscher (2005).


The purpose of this paper is to provide a unified perspective on failures of uniformity. We
argue that in many settings the poor performance of estimators or lack of uniformity of tests
and confidence sets arises as a consequence of the lack of uniformity of approximations using
the “delta method.” Motivated by this observation, we provide sufficient and necessary
conditions for uniform negligibility of the remainder of an asymptotic approximation using the
delta method. These conditions are easily verified. This makes it possible to spot potential
problems with standard asymptotics and to identify parts of the parameter space where problems
might be expected. Our sufficient conditions require that the function $\phi$ of interest is
continuously differentiable, and that the remainder of the first order approximation
$\phi(x + \Delta x) \approx \phi(x) + \phi'(x)\Delta x$ is uniformly small relative to the leading
term $\phi'(x)\Delta x$, in a sense to be made precise below.

In the case of weak instruments, this condition fails in a neighborhood of $x_2 = 0$ for the
function $\phi(x) = x_1/x_2$, which is applied to the “reduced form” covariances of instrument and
outcome, and instrument and treatment. In the case of moment inequalities or multiple hypothesis
testing, remainders are not negligible in a neighborhood of kinks of the null-region, where $\phi$
is for instance the distance of a statistic to the null-region. For interval-identified objects as
discussed by Imbens and Manski (2004), such a kink corresponds to the case of point-identification.


In the case of minimum-distance estimation, with over-identified parameters, remainders are
non-negligible when the manifold traced by the model has kinks or high curvature. In the case
of pre-testing and model selection, this condition fails in a neighborhood of critical values for the
pretest, where $\phi$ is the mapping from sample-moments to the estimated coefficients of interest;
in the case of Lasso, in the neighborhood of kinks of the mapping from sample-moments to the
estimated coefficients.¹

The rest of this paper is structured as follows. Section 2 provides a brief review of the
literature. Section 3 reviews definitions and discusses some preliminary results, including a
result relating uniform convergence in distribution to uniformly valid confidence sets, and a
uniform version of the continuous mapping theorem. Section 4 presents our central result, the
sufficient and necessary conditions for uniform validity of the delta method. This section also
shows that continuous differentiability on a compact domain is sufficient for uniform validity
of the delta-method. Section 5 discusses several applications to illustrate the usefulness of our
approach, including a number of stylized examples, weak instruments, moment inequalities, and
minimum distance estimation. Appendix A contains all proofs.


¹Additional complications in pre-testing, model selection and Lasso settings arise because of drifting critical
values or penalty parameters, so that they are only partially covered by our basic argument.

2 Literature


Uniformity considerations have a long tradition in the statistical literature, at least since Hodges
discussed his estimator and Wald analyzed minimax estimation. Uniformity considerations have
motivated the development of much of modern asymptotic theory and in particular the notion
of limiting experiments, as reviewed for instance in Le Cam and Yang (2012).


The interest in uniform asymptotics in econometrics was prompted by the poor finite-sample
performance of some commonly used statistical procedures. Important examples include the
study of ‘local-to-zero’ behavior of estimators, tests and confidence sets in linear instrumental
variable regression (Staiger and Stock 1997; Moreira 2003; Andrews et al. 2006); the ‘local-to-unit
root’ analysis for autoregressive parameters (see Stock and Watson 1996; Mikusheva 2007);
and the behavior of estimators, tests and confidence sets that follow a pre-testing or a model
selection stage (Leeb and Pötscher 2005; Guggenberger 2010a).


Much of this literature has been concerned with finding statistical procedures that control 
size uniformly in large samples over a reasonable parameter space. Our paper has a different 
objective. We argue that in most of these problems there are reduced-form statistics satisfying 
uniform central limit theorems (CLT) and uniform laws of large numbers (LLN).² The failure
of some commonly used tests and confidence sets to be uniformly valid, despite the uniform 
convergence of reduced-form statistics, is a consequence of the lack of uniformity of the delta 
method. 

There are some discussions of uniformity and the delta method in the literature.
van der Vaart (2000), for instance, discusses uniform validity of the delta-method in section 3.4.
His result requires continuous differentiability of the function of interest $\phi$ on an open set,
and convergence of the sequence of parameters $\theta_n$ to a point $\theta$ inside this open set.
This result does not allow us to study behavior near boundary points of the domain, which will be
of key interest to us, as discussed below. The result in van der Vaart (2000), section 3.4, is an
implication of our more general theorems below. Uniformity of the delta method has also been
studied recently by Belloni et al. (2013), with a focus on infinite dimensional parameters. They
provide a sufficient condition (a slight modification of the notion of uniform Hadamard
differentiability in van der Vaart and Wellner (1996), p. 379) that guarantees the uniform validity
of delta method approximations. In contrast to their condition, (i) we do not require the parameter
space to be compact,³ and (ii) we provide necessary as well as sufficient conditions for uniform
convergence.

²Uniform CLT and uniform LLN in this paper are used to describe results that guarantee uniform convergence
in distribution or probability of random vectors, rather than results that guarantee convergence of empirical
processes.


The analysis in Andrews and Mikusheva (2014) is closely related to ours in spirit. They consider
tests of moment equality restrictions when the null-space is curved. First-order (delta-method)
approximations to their test statistic are poor if the curvature is large.

The uniform delta-method established in this paper does not allow the function $\phi$ to
depend on the sample size. Phillips (2012) has extended the pointwise delta-method in such a
direction using an asymptotically locally relative equicontinuity condition.

Not all uniformity considerations fall into the framework discussed in our paper. This is in 
particular true for local to unit root analysis in autoregressive models. Model selection, and 
the Lasso, face problems closely related to those we discuss. Additional complications arise in 
these settings, however, because of drifting critical values or penalty parameters, which lead to 
“oracle-property” type approximations that are not uniformly valid. 

3 Preliminaries 


In this section we introduce notation, define notions of convergence, and state some basic results
(which appear to be known, but are scattered through the literature). Throughout this paper, we
consider random variables defined on the fixed sample space $\Omega$, which is equipped with a
sigma-algebra $\mathcal{F}$ and a family of probability measures $P_\theta$ on $(\Omega, \mathcal{F})$
indexed by $\theta \in \Theta$. $S, T, X, Y, Z$ and $S_n, T_n, X_n, Y_n, Z_n$ are random variables
or random vectors defined on $\Omega$. We are interested in asymptotic approximations with respect
to $n$. $\mu = \mu(\theta)$ denotes some finite-dimensional function of $\theta$, and $F$ is used to
denote cumulative distribution functions. The derivative of $\phi(m)$ with respect to $m$ is denoted
by $D(m) = \partial\phi/\partial m = \partial_m \phi$.

The goal of this paper is to provide conditions that guarantee uniform convergence in
distribution. There are several equivalent ways to define convergence in distribution. One
definition requires convergence in terms of the so-called bounded Lipschitz metric, cf.
van der Vaart and Wellner (1996), p. 73. This definition is useful for our purposes, since it
allows for a straightforward extension to uniform convergence.

³Compactness excludes settings where problems arise near boundary points, such as weak instruments.


Definition 1 (Bounded Lipschitz metric)

Let $BL_1$ be the set of all real-valued functions $h$ on $\mathbb{R}^{d_x}$ such that $|h(x)| \le 1$
and $|h(x) - h(x')| \le \|x - x'\|$ for all $x, x'$.

The bounded Lipschitz metric on the set of random vectors with support in $\mathbb{R}^{d_x}$ is defined by

$$d^{\theta}_{BL}(X_1, X_2) := \sup_{h \in BL_1} \left| E^{\theta}[h(X_1)] - E^{\theta}[h(X_2)] \right|. \tag{1}$$

In this definition, the distance of two random variables $X_1$ and $X_2$ depends on $\theta$, which indexes
the distribution of $X_1$ and $X_2$, and thus also the expectation of functions of these random
variables. Standard asymptotics is about convergence for any given $\theta$. Uniformity requires
convergence for any sequence of $\theta_n$, as in the following definition.

Definition 2 (Uniform convergence)

Let $\theta \in \Theta$ be a (possibly infinite dimensional) parameter indexing the distribution of both $X_n$
and $Y_n$.

1. We say that $X_n$ converges uniformly in distribution to $Y_n$ if

$$d^{\theta_n}_{BL}(X_n, Y_n) \to 0 \tag{2}$$

for all sequences $\{\theta_n \in \Theta\}$.

2. We say that $X_n$ converges uniformly in probability to $Y_n$ if

$$P^{\theta_n}(\|X_n - Y_n\| > \epsilon) \to 0 \tag{3}$$

for all $\epsilon > 0$ and all sequences $\{\theta_n \in \Theta\}$.


Remarks:

• As shown in van der Vaart and Wellner (1996, section 1.12), convergence in distribution
of $X_n$ to $X$, defined in the conventional way as convergence of cumulative distribution
functions (CDFs) at points of continuity of the limiting CDF, is equivalent to convergence
of $d^{\theta}_{BL}(X_n, X)$ to 0.

• Our definition of uniform convergence might seem slightly unusual. We define it as
convergence of a sequence $X_n$ toward another sequence $Y_n$. In the special case where $Y_n = X$,
so that $Y_n$ does not depend on $n$, this definition reduces to the more conventional one, so
our definition is more general.

• There are several equivalent ways to define uniform convergence, whether in distribution
or in probability. Definition 2 requires convergence along all sequences $\theta_n$. The following
Lemma 1, which is easy to prove, shows that this is equivalent to requiring convergence
of suprema over all $\theta$.

Lemma 1 (Characterization of uniform convergence)

1. $X_n$ converges uniformly in distribution to $Y_n$ if and only if

$$\sup_{\theta \in \Theta} d^{\theta}_{BL}(X_n, Y_n) \to 0. \tag{4}$$

2. $X_n$ converges uniformly in probability to $Y_n$ if and only if

$$\sup_{\theta \in \Theta} P^{\theta}(\|X_n - Y_n\| > \epsilon) \to 0 \tag{5}$$

for all $\epsilon > 0$.

The proof of this lemma and of all following results can be found in Appendix A.

Uniform convergence safeguards, in large samples, against asymptotic approximations
performing poorly for some values of $\theta$. Absent uniform convergence, there are values of
$\theta$, for arbitrarily large $n$, for which the approximation is far from the truth. Guaranteeing
uniformity is relevant, in particular, to guarantee the validity of inference procedures. The
following result shows how uniform convergence of a test-statistic to a pivotal distribution allows
us to construct confidence sets with appropriate coverage. This result could equivalently be stated
in terms of hypothesis testing.

Lemma 2 (Uniformly valid confidence sets)

Suppose that $Z_n$ converges uniformly in distribution to $Z$, where $Z_n = Z_n(\mu)$. Suppose further
that $Z$ is continuously distributed and pivotal, that is, the distribution of $Z$ does not depend on
$\theta$. Let $z$ be the $1 - \alpha$ quantile of the distribution of $Z$. Then

$$C_n := \{m : Z_n(m) \le z\} \tag{6}$$

is such that

$$P^{\theta_n}(\mu(\theta_n) \in C_n) \to 1 - \alpha \tag{7}$$

for any sequence $\theta_n \in \Theta$.

Lemma 2 establishes the connection between our definition of uniform convergence in
distribution and uniformly valid inference. The latter hinges on convergence of

$$F^{\theta_n}_{Z_n(\mu)}(z) - F_Z(z)$$

to 0 for a given critical value $z$, whereas uniform convergence in distribution of $Z_n$ to $Z$ can be
shown to be equivalent to convergence of

$$\sup_{\theta \in \Theta} \sup_{z} \left| F^{\theta}_{Z_n(\mu)}(z) - F_Z(z) \right|$$

to 0. If we were to require uniform validity of inference for arbitrary critical values, this would
in fact be equivalent to uniform convergence in distribution of the test-statistic. We should
emphasize again, however, that uniform convergence in distribution is a concern even if we are
not interested in uniform size control, but for instance in the risk of an estimator.


From our definition of uniform convergence, it is straightforward to show the following
uniform version of the continuous mapping theorem. The standard continuous mapping theorem
states that convergence in distribution (probability) of $X_n$ to $X$ implies convergence in
distribution (probability) of $\psi(X_n)$ to $\psi(X)$ for any continuous function $\psi$. Our uniform
version of this result needs to impose the slightly stronger requirement that $\psi$ be uniformly
continuous (for uniform convergence in probability) or Lipschitz-continuous (for uniform
convergence in distribution).

Theorem 1 (Uniform continuous mapping theorem)

Let $\psi(x)$ be a function of $x$ taking values in $\mathbb{R}^l$.

1. Suppose $X_n$ converges uniformly in distribution to $Y_n$.
If $\psi$ is Lipschitz-continuous, then $\psi(X_n)$ converges uniformly in distribution to $\psi(Y_n)$.

2. Suppose $X_n$ converges uniformly in probability to $Y_n$.
If $\psi$ is uniformly continuous, then $\psi(X_n)$ converges uniformly in probability to $\psi(Y_n)$.

Remarks:

• Continuity of $\psi$ would not be enough for either statement to hold. To see this, consider
the following example: assume $\theta \in \mathbb{R}^+$, $X_n = \theta$, and $Y_n = X_n + 1/n$. Then clearly $Y_n$
converges uniformly (as a sequence, in probability, and in distribution) to $X_n$. Let now
$\psi(x) = 1/x$, and $\theta_n = 1/n$. $\psi$ is a continuous function on the support of $X_n$ and $Y_n$. Then
$\psi(X_n) = n$, $\psi(Y_n) = n/2$, and $P^{\theta_n}(|\psi(X_n) - \psi(Y_n)| = n/2) = 1$, and thus $\psi(Y_n)$ does not
converge uniformly (in probability, or in distribution) to $\psi(X_n)$.

• There is, however, an important special case for which continuity of $\psi$ is enough: If
$Y_n = Y$ and the distribution of $Y$ does not depend on $\theta$, so that $Y$ is a pivot, then
convergence of $X_n$ to $Y$ uniformly in distribution implies that $\psi(X_n)$ converges uniformly
in distribution to $\psi(Y)$ for any continuous function $\psi$. This follows immediately from the
standard continuous mapping theorem, applied along arbitrary sequences $\{\theta_n\}$.
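
The counterexample in the first remark can be traced numerically. The following is a minimal sketch, assuming the degenerate setup of that remark ($X_n = \theta$, $Y_n = X_n + 1/n$, $\theta_n = 1/n$, $\psi(x) = 1/x$); the function names are hypothetical and chosen only for illustration. It shows that the gap between $X_n$ and $Y_n$ shrinks along the sequence while the gap between $\psi(X_n)$ and $\psi(Y_n)$ grows.

```python
def psi(x):
    """The map psi(x) = 1/x, continuous but not uniformly continuous on (0, inf)."""
    return 1.0 / x


def gap_before_and_after(n):
    """Return (|X_n - Y_n|, |psi(X_n) - psi(Y_n)|) evaluated at theta_n = 1/n."""
    theta_n = 1.0 / n
    x_n = theta_n          # X_n = theta is degenerate at theta_n
    y_n = x_n + 1.0 / n    # Y_n = X_n + 1/n
    return abs(x_n - y_n), abs(psi(x_n) - psi(y_n))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        before, after = gap_before_and_after(n)
        # 'before' equals 1/n and shrinks; 'after' equals n/2 and grows.
        print(n, before, after)
```

Because both random variables are degenerate here, no simulation is needed; the deterministic gap $n/2$ is the whole argument.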

4 The uniform delta-method 

We will now discuss the main result of this paper. In the following Theorem 2, we consider a
sequence of random variables (or random vectors) $T_n$ such that

$$S_n := r_n(T_n - \mu) \to S$$

uniformly. We are interested in the distribution of some function $\phi$ of $T_n$. Let

$$D(m) = \frac{\partial \phi}{\partial m}(m), \quad \text{and} \quad E(m) = \mathrm{diag}(\|D_k(m)\|)^{-1},$$

where $\|D_k(m)\|$ is the norm of the $k$th row of $D(m)$. Consider the normalized sequence of
random variables

$$X_n = r_n E(\mu)(\phi(T_n) - \phi(\mu)). \tag{8}$$

We aim to approximate the distribution of $X_n$ by the distribution of

$$X := E(\mu) D(\mu) \cdot S. \tag{9}$$

Recall that the distributions of $T_n$ and $X_n$ are functions of both $\theta$ and $n$. $\mu$ and the distribution
of $S$ are functions of $\theta$ (cf. Definition 2). The sequence $r_n$ is not allowed to depend on $\theta$; the
leading case is $r_n = \sqrt{n}$. If $T_n$ is a vector of dimension $d_t$ and $X_n$ is of dimension $d_x$, then
$D = \partial\phi/\partial\mu$ is a $d_x \times d_t$ matrix of partial derivatives. $E(\mu) = \mathrm{diag}(\|D_k(\mu)\|)^{-1}$ is a
$d_x \times d_x$ diagonal matrix which serves to normalize the rows of $D(\mu)$.


Our main result, Theorem 2, requires that the “reduced form” statistics $S_n$ converge uniformly
in distribution to a tight family of continuously distributed random variables $S$. Uniform
convergence of the reduced form can be established for instance using central limit theorems for
triangular arrays, cf. Lemma 3 below.

Assumption 1 (Uniform convergence of $S_n$)

Let $S_n := r_n(T_n - \mu)$.

1. $S_n \to S$ uniformly in distribution for a sequence $r_n \to \infty$ which does not depend on $\theta$.

2. $S$ is continuously distributed for all $\theta \in \Theta$.

3. The collection $\{S(\theta)\}_{\theta \in \Theta}$ is tight, that is, for all $\epsilon > 0$ there exists an $M < \infty$ such that
$P(\|S\| \le M) \ge 1 - \epsilon$ for all $\theta$.

The leading example of such a limiting distribution is the normal distribution $S \sim N(0, \Sigma(\theta))$.
This satisfies the condition of tightness if there is a uniform upper bound on $\|\Sigma(\theta)\|$.

The sufficient condition for uniform convergence in Theorem 2 below furthermore requires a
uniform bound on the remainder of a first order approximation to $\phi$. Denote the normalized
remainder of a first order Taylor expansion of $\phi$ around $m$ by

$$\Delta(t, m) = \frac{1}{\|t - m\|} \left\| E(m) \cdot \left( \phi(t) - \phi(m) - D(m) \cdot (t - m) \right) \right\|. \tag{10}$$

The function $\phi$ is differentiable, as required by the pointwise delta-method, if and only if
$\Delta(t, m)$ goes to 0 as $t \to m$ for fixed $m$.

Note the role of $E(m)$ in the definition of $\Delta$. Normalization by $E(m)$ ensures that we are
considering the magnitude of the remainder relative to the leading term $D(m) \cdot (t - m)$. This
allows us to consider settings with unbounded derivatives $D(m)$.
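
The normalized remainder in (10) is straightforward to evaluate numerically. The following is a minimal sketch, restricted (as an assumption made here for simplicity) to scalar-valued $\phi$ on $\mathbb{R}^d$, so that $E(m) = 1/\|D(m)\|$; the function name `normalized_remainder` and its arguments are hypothetical and not part of the paper.

```python
import math


def normalized_remainder(phi, grad, t, m):
    """Delta(t, m) from equation (10) for a scalar-valued phi on R^d.

    phi  : callable mapping a list of floats to a float
    grad : callable returning the gradient D(m) as a list of floats
    t, m : lists of floats of equal length, with t != m
    """
    d = grad(m)
    norm_d = math.sqrt(sum(dk * dk for dk in d))       # ||D(m)||, so E(m) = 1 / norm_d
    step = [tk - mk for tk, mk in zip(t, m)]           # t - m
    norm_step = math.sqrt(sum(s * s for s in step))    # ||t - m||
    linear = sum(dk * sk for dk, sk in zip(d, step))   # leading term D(m) . (t - m)
    remainder = phi(t) - phi(m) - linear               # first-order Taylor remainder
    return abs(remainder) / (norm_d * norm_step)


if __name__ == "__main__":
    # Illustration with phi(t) = t^2: the same step t - m = 0.5 gives a
    # larger normalized remainder when m is closer to 0.
    square = lambda x: x[0] ** 2
    square_grad = lambda x: [2.0 * x[0]]
    print(normalized_remainder(square, square_grad, [1.5], [1.0]))
    print(normalized_remainder(square, square_grad, [0.6], [0.1]))
```

Dividing by both $\|D(m)\|$ and $\|t - m\|$ is what makes the quantity a ratio of remainder to leading term rather than an absolute error.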

The first part of Theorem 2 states a sufficient condition for uniform convergence of $X_n$ to $X$.
This condition is a form of “uniform differentiability;” it requires that the remainder $\Delta(t, m)$
of a first order approximation to $\phi$ becomes uniformly small relative to the leading term as
$\|t - m\|$ becomes small. This condition fails to hold in all motivating examples mentioned at
the beginning of this paper: weak instruments, moment inequalities, model selection, and the
Lasso.

The second part of Theorem 2 states a condition which implies that $X_n$ does not converge
uniformly to $X$ in distribution. This condition requires the existence of a sequence $m_n \in \mu(\Theta)$
such that the remainder of a first order approximation becomes large relative to the leading term.

Theorem 2 (Uniform delta method)

Suppose Assumption 1 holds. Let $\phi$ be a function which is continuously differentiable everywhere
in an open set containing $\mu(\Theta)$, the set of $\mu$ corresponding to the parameter space $\Theta$.⁴ Assume
that $D(\mu)$ has full row rank for all $\mu \in \mu(\Theta)$.

1. Suppose that

$$\Delta(t, m) \le \delta(\|t - m\|) \tag{11}$$

for some function $\delta$ where $\lim_{\epsilon \to 0} \delta(\epsilon) = 0$, and for all $m \in \mu(\Theta)$.

Then $X_n$ converges uniformly in distribution to $X$.

2. Suppose there exists an open set $A \subset \mathbb{R}^{d_t}$ such that $\inf_{s \in A} \|s\| = \underline{s} > 0$ and
$P_\theta(S \in A) \ge \underline{p} > 0$ for all $\theta \in \Theta$, and a sequence $(\epsilon'_n, m_n)$, $\epsilon'_n > 0$ and
$m_n \in \mu(\Theta)$, such that

$$\Delta(m_n + s/r_n, m_n) \ge \epsilon'_n \quad \forall s \in A, \tag{12}$$

$$\epsilon'_n \to \infty.$$

Then $X_n$ does not converge uniformly in distribution to $X$.

⁴Note that $\Theta$ and $\mu(\Theta)$ might be open and/or unbounded.

In any given application, we can check uniform validity of the delta method by verifying
whether the sufficient condition in part 1 of this theorem holds. If it does not, it can in general
be expected that uniform convergence in distribution will fail. Part 2 of the theorem allows us to
show this directly, by finding a sequence of parameter values such that the remainder of a
first-order approximation dominates the leading term. There is also an intermediate case, where the
remainder is of the same order as the leading term along some sequence $\theta_n$, so that the condition
in part 2 holds for some constant rather than diverging sequence $\epsilon'_n$. This intermediate case
is covered by neither part of Theorem 2. In this intermediate case, uniform convergence in
distribution would be an unlikely coincidence. Non-convergence for such intermediate cases is
best shown on a case-by-case basis. Section 5.1 discusses several simple examples of functions
$\phi(t)$ for which we demonstrate that the uniform validity of the delta method fails: $1/t$, $t^2$, $|t|$,
and $\sqrt{t}$.

The following Theorem 3, which is a consequence of Theorem 2, shows that a compact domain
of $\phi$ is sufficient for condition (11) to hold. While compactness is too restrictive for most settings
of interest, this result indicates where we might expect uniformity to be violated: either in the
neighborhood of boundary points of the domain $\mu(\Theta)$ of $\phi$, if this domain is not closed, or
as $m \to \infty$. The applications discussed below are all in the former category, and so are the
functions $1/t$, $t^2$, $|t|$, and $\sqrt{t}$. Define the set $\mathcal{T}^{\epsilon}$ for any given set $\mathcal{T}$ as

$$\mathcal{T}^{\epsilon} = \{t : \|t - \mu\| \le \epsilon, \ \mu \in \mathcal{T}\}. \tag{13}$$

Theorem 3 (Sufficient condition)

Suppose that $\mu(\Theta)$ is compact and $\phi$ is everywhere continuously differentiable on $\mu(\Theta)^{\epsilon}$ for
some $\epsilon > 0$, and that $D(\mu)$ has full row rank for all $\mu \in \mu(\Theta)$. Then condition (11) holds.


Theorem 2 requires that the “reduced form” statistics $S_n$ converge uniformly in distribution
to a tight family of continuously distributed random variables $S$. One way to establish uniform
convergence of reduced form statistics is via central limit theorems for triangular arrays, as in
the following lemma, which immediately follows from Lyapunov’s central limit theorem.

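Lemma 3 (Uniform central limit theorem)

Let $Y_i$ be i.i.d. random variables with mean $\mu(\theta)$ and variance $\Sigma(\theta)$. Assume that
$E[\|Y_i\|^{2+\epsilon}] < M$ for some $\epsilon > 0$ and a constant $M$ independent of $\theta$. Then

$$S_n := \frac{1}{\sqrt{n}} \sum_{i=1}^{n} (Y_i - \mu(\theta))$$

converges uniformly in distribution to the tight family of continuously distributed random
variables $S \sim N(0, \Sigma(\theta))$.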

In Lemma 2 above we established that uniform convergence to a continuous pivotal
distribution allows us to construct uniformly valid hypothesis tests and confidence sets. Theorem 2
guarantees uniform convergence in distribution; some additional conditions are required to allow
for construction of a statistic which uniformly converges to a pivotal distribution. The following
proposition provides an example.

Proposition 1 (Convergence to a pivotal statistic)

Suppose that the assumptions of Theorem 2 and condition (11) hold. Assume additionally that

1. $\partial\phi/\partial m$ is Lipschitz continuous and its determinant is bounded away from 0,

2. $S \sim N(0, \Sigma)$, where $\Sigma$ might depend on $\theta$ and its determinant is bounded away from 0,

3. and that $\hat\Sigma$ is a uniformly consistent estimator for $\Sigma$, in the sense that $\|\hat\Sigma - \Sigma\|$ converges
to 0 in probability along any sequence $\theta_n$.

Let 

^ -Xn. (14) 

Then 

$$Z_n \to N(0, I)$$


uniformly in distribution. 

5 Applications 

5.1 Stylized examples 

The following examples illustrate various ways in which functions $\phi(t)$ might violate the
necessary and sufficient condition in Theorem 2 and the sufficient condition of Theorem 3 (continuous
differentiability on an $\epsilon$-enlargement of a compact domain). In all of the examples we consider,
problems arise near the boundary point 0 of the domain $\mu(\Theta) = \mathbb{R} \setminus \{0\}$ of $\phi$.⁵ All of these
functions might reasonably arise in various statistical contexts.

• The first example, 1/t, is a stylized version of the problems arising in inference using weak 
instruments. This function diverges at 0, and problems emerge in a neighborhood of this 
point. 

• The second example, $t^2$, is seemingly very well behaved, and in particular continuously
differentiable on all of $\mathbb{R}$. In a neighborhood of 0, however, the leading term of a first-order
expansion becomes small relative to the remainder term.

• The third example, |t|, is a stylized version of the problems arising in inference based on 
moment inequalities. This function is continuous everywhere on R. It is not, however, 
differentiable at 0, and problems emerge in a neighborhood of this point. 

• The last example, $\sqrt{t}$ on $\mathbb{R}^+$, illustrates an intermediate case between example 1 and
example 3. This function is continuous on its domain and differentiable everywhere except
at 0; in contrast to $|t|$ it does not possess a directional derivative at 0.

⁵Recall that we consider functions $\phi$ which are continuously differentiable on the domain $\mu(\Theta)$; $\phi$ might not
be differentiable or not even well defined at boundary points of this domain.

In each of these examples problems emerge at a boundary point of the domain of continuous
differentiability; such problems could not arise if the domain of $\phi$ were closed. The neighborhood
of such boundary points is often of practical relevance. Problems could in principle also emerge
for very large $m$; such problems could not arise if the domain of $\phi$ were bounded. Very large
$m$ might however be considered to have lower “prior likelihood” in many settings. For each of
these examples we provide analytical expressions for $\Delta$, a discussion in the context of Theorem 2,
as well as visual representations of $\Delta$.


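5.1.1 $\phi(t) = 1/t$

For the function $\phi(t) = 1/t$ we get $D(m) = \partial_m \phi(m) = -1/m^2$, $E(m) = m^2$, and

$$\Delta(t, m) = \frac{m^2}{|t - m|} \left| \frac{1}{t} - \frac{1}{m} + \frac{t - m}{m^2} \right|
= \frac{m^2}{|t - m|} \left| \frac{m \cdot (m - t) + t \cdot (t - m)}{m^2 \cdot t} \right|
= \left| \frac{t - m}{t} \right|.$$

Figure 1 shows a plot of the remainder $\Delta$; we will have similar plots for the following examples.
We get $\Delta > \epsilon'$ if and only if $|t - m|/|t| > \epsilon'$. This holds if either

$$t < \frac{m}{1 + \epsilon'}, \quad \text{or} \quad t > \frac{m}{1 - \epsilon'} \ \text{and} \ \epsilon' < 1.$$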

We can show failure of the delta method to be uniformly valid for $\phi(t) = 1/t$, using the
second part of Theorem 2. In the notation of this theorem, let

$$r_n = \sqrt{n}, \quad \epsilon'_n = \sqrt{n}, \quad m_n = \frac{1}{\sqrt{n}} + \frac{1}{n}, \quad A = ]-2, -1[.$$

It is easy to check that the condition of Theorem 2, part 2 applies for these choices.
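5.1.2 $\phi(t) = t^2$

For the function $\phi(t) = t^2$ we get $D(m) = \partial_m \phi(m) = 2m$, $E(m) = 1/(2m)$, and

$$\Delta(t, m) = \frac{1}{2m \cdot |t - m|} \left| t^2 - m^2 - 2m \cdot (t - m) \right|
= \frac{1}{2m \cdot |t - m|} \left| (t - m)^2 \right|
= \left| \frac{t - m}{2m} \right|.$$

We therefore get $\Delta > \epsilon'$ if and only if $|t - m|/|2m| > \epsilon'$. This holds if either

$$t < m \cdot (1 - 2\epsilon'), \quad \text{or} \quad t > m \cdot (1 + 2\epsilon').$$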


We can again show failure of the delta method to be uniformly valid for $\phi(t) = t^2$, using the
second part of Theorem 2. In the notation of this theorem, let

$$m_n = \frac{1}{n}, \quad A = ]1, 2[.$$

It is easy to check that the condition of Theorem 2, part 2 applies for these choices.
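
Condition (12) for $\phi(t) = t^2$ can be examined numerically. The following is a minimal sketch under two assumptions that are made here and are not legible in the text above: the leading-case rate $r_n = \sqrt{n}$, and the candidate lower bound $\epsilon'_n = \sqrt{n}/2$. With $m_n = 1/n$ and $s \in A = ]1, 2[$, the closed form gives $\Delta(m_n + s/r_n, m_n) = s\sqrt{n}/2$. The function names and the grid over $A$ are hypothetical.

```python
import math


def delta_square(t, m):
    """Delta(t, m) for phi(t) = t^2, computed directly from definition (10)."""
    d = 2.0 * m                                  # D(m)
    remainder = t ** 2 - m ** 2 - d * (t - m)    # equals (t - m)^2
    return abs(remainder / d) / abs(t - m)


def min_delta_over_A(n, grid=50):
    """Smallest Delta(m_n + s / r_n, m_n) over a grid of s strictly inside ]1, 2[."""
    r_n = math.sqrt(n)    # assumed rate (leading case)
    m_n = 1.0 / n
    values = []
    for k in range(1, grid):
        s = 1.0 + k / grid
        values.append(delta_square(m_n + s / r_n, m_n))
    return min(values)


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        # The minimum over the grid stays above sqrt(n) / 2, which diverges.
        print(n, min_delta_over_A(n), math.sqrt(n) / 2.0)
```

The grid only samples $A$; the closed form $s\sqrt{n}/2$ is what establishes the bound for every $s \in A$.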


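5.1.3 $\phi(t) = |t|$

For the function $\phi(t) = |t|$, we get $D(m) = \partial_m \phi(m) = \mathrm{sign}(m)$, $E(m) = 1$ and thus the
normalized remainder of the first-order Taylor expansion is given by

$$\Delta(t, m) = \frac{1}{|t - m|} \Big| |t| - |m| - \mathrm{sign}(m) \cdot (t - m) \Big|
= \mathbf{1}(t \cdot m < 0) \cdot \frac{2 \cdot |t|}{|t - m|}.$$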


To see that the sufficient condition of the first part of Theorem 2 does not hold for this
example, consider the sequence

$$m_n = 1/n, \quad t_n = -1/n, \quad \Delta(t_n, m_n) = 1.$$

In this example, however, the remainder does not diverge; the condition for non-convergence
of the second part of Theorem 2 does not apply either. To see this, note that

$$\Delta(t, m) \le 2$$

for all $t, m$ in this case. We are thus in the intermediate case, where remainder and leading term
remain of the same order as $m$ approaches 0, and have to show non-convergence “by hand.” To do
so, suppose $S_n$ converges uniformly in distribution to $S \sim N(0, 1)$, and $r_n = \sqrt{n}$. It immediately
follows that $X \sim N(0, 1)$ for all $\theta$. Consider a sequence $\theta_n$ such that $m_n = \mu(\theta_n) = 1/n$. For
this sequence we get $X_n = |S_n + 1/\sqrt{n}| - 1/\sqrt{n}$, so that $F^{\theta_n}_{X_n}(0) \to 0$, whereas

$$F^{\theta_n}_{X}(0) = 1/2,$$

which immediately implies that we cannot have $X_n \to X$ along this sequence.
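
The “by hand” argument for $\phi(t) = |t|$ can be made concrete. The following is a minimal sketch, assuming for illustration that $S_n$ is exactly $N(0, 1)$ (rather than only approximately so), with $r_n = \sqrt{n}$ and $\mu(\theta_n) = 1/n$. From definition (8) with $E(\mu) = 1$, $X_n = \sqrt{n}(|T_n| - 1/n) = |S_n + 1/\sqrt{n}| - 1/\sqrt{n}$, so $X_n \le 0$ exactly when $-2/\sqrt{n} \le S_n \le 0$. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_xn_nonpositive(n):
    """P(X_n <= 0) when S_n ~ N(0, 1) exactly, mu_n = 1/n and r_n = sqrt(n).

    X_n = |S_n + 1/sqrt(n)| - 1/sqrt(n) <= 0  iff  -2/sqrt(n) <= S_n <= 0.
    """
    return normal_cdf(0.0) - normal_cdf(-2.0 / math.sqrt(n))


if __name__ == "__main__":
    # The delta-method approximation X ~ N(0, 1) puts mass 1/2 on (-inf, 0],
    # while the mass of X_n on that set shrinks toward 0 along the sequence.
    for n in (100, 10_000, 1_000_000):
        print(n, prob_xn_nonpositive(n), normal_cdf(0.0))
```

The gap between the two CDFs at 0 therefore approaches 1/2 rather than 0 along this sequence.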


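5.1.4 $\phi(t) = \sqrt{t}$

For the function $\phi(t) = \sqrt{t}$, considered to be a function on $\mathbb{R}^+$, we get
$D(m) = \partial_m \phi(m) = 1/(2\sqrt{m})$, $E(m) = 2 \cdot \sqrt{m}$, and

$$\Delta(t, m) = \frac{2 \cdot \sqrt{m}}{|(\sqrt{t} - \sqrt{m})(\sqrt{t} + \sqrt{m})|} \left| \sqrt{t} - \sqrt{m} - \frac{t - m}{2\sqrt{m}} \right|
= \left| \frac{2 \cdot \sqrt{m}}{\sqrt{t} + \sqrt{m}} - 1 \right|
= \left| \frac{\sqrt{m} - \sqrt{t}}{\sqrt{m} + \sqrt{t}} \right|.$$

This implies $\Delta > \epsilon'$ if and only if $|\sqrt{m} - \sqrt{t}| > \epsilon'(\sqrt{m} + \sqrt{t})$. This holds if either

$$\sqrt{t} < \sqrt{m} \cdot \frac{1 - \epsilon'}{1 + \epsilon'}, \quad \text{or} \quad
\sqrt{t} > \sqrt{m} \cdot \frac{1 + \epsilon'}{1 - \epsilon'} \ \text{and} \ \epsilon' < 1.$$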


Again, the sufficient condition of the first part of Theorem 2 does not hold for this example.
To show this, consider the sequence

$$m_n = 4/n^2, \quad t_n = 1/n^2, \quad \Delta(t_n, m_n) = 1/3.$$

Again, as well, the remainder does not diverge; the condition for non-convergence of the second
part of Theorem 2 does not apply either. To see this, note that

$$\Delta(t, m) \le 1$$

for all $t, m > 0$.
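
Both claims about $\phi(t) = \sqrt{t}$, the value $1/3$ along the sequence and the bound by 1, can be checked numerically. The following is a minimal sketch with hypothetical function names; the grid of points used for the bound is an arbitrary choice made here for illustration.

```python
import math


def delta_sqrt(t, m):
    """Delta(t, m) for phi(t) = sqrt(t), computed directly from definition (10)."""
    d = 1.0 / (2.0 * math.sqrt(m))                       # D(m); E(m) = 1 / d
    remainder = math.sqrt(t) - math.sqrt(m) - d * (t - m)
    return abs(remainder / d) / abs(t - m)


def delta_sqrt_closed_form(t, m):
    """The simplified expression |sqrt(m) - sqrt(t)| / (sqrt(m) + sqrt(t))."""
    return abs(math.sqrt(m) - math.sqrt(t)) / (math.sqrt(m) + math.sqrt(t))


if __name__ == "__main__":
    # Along m_n = 4/n^2, t_n = 1/n^2 the remainder stays at 1/3 although
    # |t_n - m_n| -> 0, so no bound of the form (11) can hold.
    for n in (10, 100, 1000):
        print(n, delta_sqrt(1.0 / n ** 2, 4.0 / n ** 2))

    # On a grid of positive (t, m) pairs the closed form never exceeds 1.
    points = [10.0 ** k for k in range(-6, 7)]
    print(max(delta_sqrt_closed_form(t, m) for t in points for m in points if t != m))
```

The remainder is bounded but does not vanish, which is the intermediate case described after Theorem 2.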

To show non-convergence “by hand,” suppose $S_n$ converges uniformly in distribution to
$S \sim \chi^2_1$ and $r_n = \sqrt{n}$. It immediately follows that $X \sim \chi^2_1$ for all $\theta$. Consider a sequence $\theta_n$
such that $m_n = \mu(\theta_n) = 1/n$. For this sequence we get

N{Q, 1) 

Y Gn ^‘2. 

^ Xl 7 

which immediately implies that we cannot have $X_n \to X$ along this sequence.


5.2 Weak instruments 

Suppose we observe an i.i.d. sample $(Y_i, D_i, Z_i)$, where we assume for simplicity that $E[Z] = 0$
and $\mathrm{Var}(Z) = 1$. Consider the linear IV estimator

$$\hat\beta := \frac{E_n[ZY]}{E_n[ZD]}. \tag{15}$$

To map this setting into the general framework discussed in this paper, let

$$T_n = E_n[(ZY, ZD)], \quad \mu(\theta) = E[(ZY, ZD)], \quad \Sigma(\theta) = \mathrm{Var}((ZY, ZD)), \quad \text{and} \quad \phi(t) = \frac{t_1}{t_2}. \tag{16}$$

In this notation, $\hat\beta = \phi(T_n)$. This is a version of the “weak instrument” setting originally
considered by Staiger and Stock (1997). Application of Lemma 3 to the statistic $T_n$ yields

$$\sqrt{n} \cdot (T_n - \mu(\theta)) \to N(0, \Sigma(\theta)),$$

as long as $E[\|(ZY, ZD)\|^{2+\epsilon}]$ is uniformly bounded for some $\epsilon > 0$.


Figure 3: $\Delta(t, m)$ for $\phi(t) = |t|$.

Theorem 2 thus applies. Taking derivatives we get $D(m) = (1/m_2, -m_1/m_2^2)$, and the inverse norm of $D(m)$ is given by $E(m) = \|D(m)\|^{-1} = m_2^2/\|m\|$. Some algebra, which can be found in appendix A, yields


$$\Delta(t, m) = \frac{1}{\|m\| \cdot \|t - m\|} \cdot \left| \frac{t_2 - m_2}{t_2} \right| \cdot \left| m_2 \cdot (t_1 - m_1) - m_1 \cdot (t_2 - m_2) \right|. \tag{17}$$


Generalizing from the example $\phi(t) = 1/t$, consider the sequence


rn = Vn 
e'„ = Vn 

For any t € mn + we get 

3 

A(t,m„) > V^- 

which implies that the condition of theorem 2, part 2, is fulfilled, and the delta method is not uniformly valid in this setting.
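To see how the normalized remainder behaves when the first stage is weak, one can evaluate it numerically. The sketch below is illustrative Python with hypothetical function names. It computes $\Delta(t, m)$ for $\phi(t) = t_1/t_2$ in two ways: from the definition, using $D(m) = (1/m_2, -m_1/m_2^2)$ and $E(m) = \|D(m)\|^{-1}$, and from the closed form in equation (17).

```python
import math


def phi(t):
    """Ratio of the two moments, phi(t) = t1 / t2."""
    return t[0] / t[1]


def delta_direct(t, m):
    """Delta(t, m) from the definition E(m) * |remainder| / ||t - m||."""
    d = (1.0 / m[1], -m[0] / m[1] ** 2)          # D(m)
    e = 1.0 / math.hypot(d[0], d[1])             # E(m) = 1 / ||D(m)||
    diff = (t[0] - m[0], t[1] - m[1])
    remainder = phi(t) - phi(m) - (d[0] * diff[0] + d[1] * diff[1])
    return e * abs(remainder) / math.hypot(diff[0], diff[1])


def delta_closed_form(t, m):
    """Delta(t, m) as given in equation (17)."""
    diff = (t[0] - m[0], t[1] - m[1])
    numerator = abs(diff[1] / t[1]) * abs(m[1] * diff[0] - m[0] * diff[1])
    return numerator / (math.hypot(m[0], m[1]) * math.hypot(diff[0], diff[1]))
```

For a strong first stage, for example `m = (1.0, 1.0)` and `t = (1.0, 1.01)`, both functions return a value below 0.01. For `m = (1.0, 0.01)` and `t = (1.0, 0.001)` the value is about 9: the remainder is large relative to the first-order term once $t_2$ is close to zero.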


5.3 Moment inequalities 

Suppose we observe an i.i.d. sample $(Y_{i1}, Y_{i2})$, where we assume for simplicity that $\mathrm{Var}(Y) = I$, and $E[Y] \neq 0$. We are interested in testing the hypothesis $E[Y] \ge 0$. A leading example of such a testing problem is inference under partial identification as discussed in Imbens and Manski (2004). We follow the approach of Hahn and Ridder (2014) in treating this as a hypothesis testing problem, and using a likelihood ratio test statistic based on the normal limit experiment. We demonstrate that the “naive” approximation to the distribution of this statistic, a $0.5 \cdot \chi^2_1$ distribution, is not uniformly valid, even though it is pointwise valid.

The log generalized likelihood ratio test statistic (equivalently in this setting, the Wald test statistic) takes the form

$$n \cdot \min_{t' \in \mathbb{R}^2_+} \|t' - T_n\|^2,$$

where $T_n$ is the sample average of $Y_i$.


Throughout this section, inequalities for vectors are taken to hold for all components.

Let

$$\begin{aligned}
T_n &= E_n[Y] \\
m &= E[Y] \\
\phi_1(t) &= \left( \operatorname{argmin}_{t' \in \mathbb{R}^2_+} \|t' - t\|^2 \right) - t, \quad \text{and} \\
\phi_2(t) &= \min_{t' \in \mathbb{R}^2_+} \|t' - t\|^2 = \|\phi_1(t)\|^2.
\end{aligned} \tag{18}$$

The parameter space $\Theta$ we consider is the space of all distributions of $Y$ such that $E[Y_1] = 0$ or $E[Y_2] = 0$, but $E[Y] \neq 0$.

“Conventional” testing of the null hypothesis $E[Y] \ge 0$ is based on critical values for the distribution of $n \cdot \phi_2(T_n)$, which are derived by applying (i) a version of the delta method to obtain the asymptotic distribution of $\phi_1(T_n)$, and (ii) the continuous mapping theorem to obtain the distribution of $\phi_2(T_n)$ from the distribution of $\phi_1(T_n)$.

We could show that the delta method does not yield uniformly valid approximations on $\Theta$ for $\phi_1(T_n)$, using the condition of theorem 2. Standard approximations in this setting, however, do not exactly fit into the framework of the delta method, since they do account for the fact that one-sided testing creates a kink in the mapping $\phi_1$ at points where $t_1 = 0$ or $t_2 = 0$. As a consequence, it is easier to explicitly calculate the remainder of the standard approximation and verify directly that this remainder does not vanish uniformly, so that uniform convergence does not hold for the implied approximations of $\phi_1(T_n)$ and $\phi_2(T_n)$. We can rewrite

$$\begin{aligned}
\phi_1(t) &= (\max(-t_1, 0), \max(-t_2, 0)) \\
\phi_2(t) &= t_1^2 \cdot 1(t_1 \le 0) + t_2^2 \cdot 1(t_2 \le 0).
\end{aligned}$$


Consider without loss of generality the case $m_2 = E[Y_2] > 0$ and $m_1 = E[Y_1] = 0$. For this case, the “conventional” asymptotic approximation sets

$$\phi_1(t) \approx \tilde\phi_1(t) := (\max(-t_1, 0), 0),$$

and correspondingly $\tilde\phi_2(t) := \|\tilde\phi_1(t)\|^2 = t_1^2 \cdot 1(t_1 \le 0)$. We can interpret this approximation as a first order approximation based on directional derivatives.

Based on this approximation, we obtain the pointwise asymptotic distribution of $n \cdot \phi_2(T_n)$ as $0.5\chi^2_1 + 0.5\delta_0$. The remainders of these approximations are independent of the approximations themselves, and are equal to

$$\begin{aligned}
\phi_1(t) - \tilde\phi_1(t) &= (0, \max(-t_2, 0)) \\
\phi_2(t) - \tilde\phi_2(t) &= t_2^2 \cdot 1(t_2 \le 0).
\end{aligned}$$

These remainders, appropriately rescaled, do converge to 0 in probability pointwise on $\Theta$, since $\sqrt{n}(T_{n2} - m_2) \to N(0, 1)$. This convergence is not uniform, however. Consider a sequence of $\theta_n$ such that $m_{n1} = 0$ and $m_{n2} = 1/\sqrt{n}$. For such a sequence we get

$$\begin{aligned}
\sqrt{n} \cdot (\phi_1(T_n) - \tilde\phi_1(T_n)) &\to^d (0, \max(-Z, 0)) \\
n \cdot (\phi_2(T_n) - \tilde\phi_2(T_n)) &\to^d Z^2 \cdot 1(Z \le 0),
\end{aligned}$$

where

$$Z \sim N(1, 1).$$
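A small simulation illustrates the difference between the pointwise and the drifting-sequence behavior of the rescaled remainder of $\phi_2$. The Python sketch below is illustrative only and rests on a simplifying assumption that is not in the text: $T_n$ is drawn directly from $N(m, I/n)$, which is exact if $Y$ is normal. The function names are hypothetical.

```python
import math
import random


def phi1(t):
    """Projection residual onto the positive orthant, component by component."""
    return (max(-t[0], 0.0), max(-t[1], 0.0))


def phi2(t):
    """Squared distance of t to the positive orthant."""
    return sum(c * c for c in phi1(t))


def phi2_conventional(t):
    """Approximation that keeps only the binding constraint (component 1)."""
    a = max(-t[0], 0.0)
    return a * a


def mean_scaled_remainder(n, m2, draws=40000, seed=0):
    """Monte Carlo mean of n * (phi2(T_n) - phi2_conventional(T_n)).

    Assumption: T_n ~ N((0, m2), I / n), drawn directly rather than
    computed as a sample average.
    """
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(n)
    total = 0.0
    for _ in range(draws):
        t = (rng.gauss(0.0, sd), rng.gauss(m2, sd))
        total += n * (phi2(t) - phi2_conventional(t))
    return total / draws
```

With a fixed mean such as `m2 = 1.0`, the rescaled remainder is essentially zero for large `n`. With `m2 = 1 / math.sqrt(n)`, its simulated mean stays near $E[Z^2 \cdot 1(Z \le 0)] \approx 0.075$ for $Z \sim N(1, 1)$, whatever the value of `n`.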

5.4 Minimum distance estimation 

Suppose that we have (i) an estimator $T_n$ of various reduced-form moments and that (ii) we also have a structural model which makes predictions about these reduced-form moments $\mu(\theta)$. If the true structural parameters are equal to $\beta$, then the reduced-form moments are equal to $m(\beta)$. Such structural models are often estimated using minimum distance estimation. Minimum distance finds the estimate $X_n$ of $\beta$ such that $m(X_n)$ gets as close as possible to the estimated moments $T_n$.

If the model is just-identified, we have $\dim(t) = \dim(x)$ and the mapping $m$ is invertible. In that case we can set

$$X_n = \phi(T_n) = m^{-1}(T_n),$$

and our general discussion applies immediately.

If the model is over-identified, there are more reduced-form moments than structural parameters. For simplicity and specificity, assume that there are two reduced-form moments but only one structural parameter, $\dim(t) = 2 > \dim(x) = 1$. Suppose that $T_n$ converges uniformly in distribution to a normal limit,

$$\sqrt{n} \cdot (T_n - \mu(\theta)) \to N(0, \Sigma(\theta)).$$

Let $X_n$ be the (unweighted) minimum distance estimator of $\beta$, that is

$$X_n = \phi(T_n) = \operatorname{argmin}_x e(x, T_n),$$

where

$$e(x, T_n) = \|T_n - m(x)\|^2.$$

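A delta-method approximation of the distribution of $X_n$ requires the slope of the mapping $\phi$. We get this slope by applying the implicit mapping theorem to the first-order condition $\partial_x e = 0$. This yields

$$\begin{aligned}
\partial_x e &= -2 \cdot \partial_x m \cdot (t - m(x)) \\
\partial_{xx} e &= -2 \cdot \left( \partial_{xx} m \cdot (t - m) - \|\partial_x m\|^2 \right) \\
\partial_{xt} e &= -2 \cdot \partial_x m \\
\partial_t \phi(t) &= -(\partial_{xx} e)^{-1} \cdot \partial_{xt} e \\
&= -\left( \partial_{xx} m \cdot (t - m) - \|\partial_x m\|^2 \right)^{-1} \cdot \partial_x m.
\end{aligned}$$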


If the model is correctly specified, then there exists a parameter value $x$ such that $m(x) = \mu(\theta)$. Evaluating the derivative $\partial_t \phi(t)$ at $t = m(x)$ yields

$$\partial_t \phi(t) = \frac{1}{\|\partial_x m\|^2} \cdot \partial_x m.$$
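The slope formula can be checked against finite differences in a toy model. The sketch below is illustrative Python under an assumption of ours: the moment curve is taken to be $m(x) = (x, x^2)$, a purely hypothetical choice. The minimum distance estimate is computed by Newton's method applied to the first-order condition $\partial_x m \cdot (t - m(x)) = 0$, whose derivative is $\partial_{xx} m \cdot (t - m) - \|\partial_x m\|^2$.

```python
def m(x):
    """Hypothetical moment curve m(x) = (x, x^2)."""
    return (x, x * x)


def dm(x):
    """First derivative of the curve."""
    return (1.0, 2.0 * x)


def d2m(x):
    """Second derivative of the curve."""
    return (0.0, 2.0)


def phi(t, x_start, steps=50):
    """Minimum distance estimate: Newton's method on dm(x) . (t - m(x)) = 0."""
    x = x_start
    for _ in range(steps):
        r = (t[0] - m(x)[0], t[1] - m(x)[1])
        g = dm(x)[0] * r[0] + dm(x)[1] * r[1]
        gp = d2m(x)[0] * r[0] + d2m(x)[1] * r[1] - (dm(x)[0] ** 2 + dm(x)[1] ** 2)
        x -= g / gp
    return x


def gradient_formula(x):
    """Slope of phi at t = m(x): dm(x) / ||dm(x)||^2."""
    norm_sq = dm(x)[0] ** 2 + dm(x)[1] ** 2
    return (dm(x)[0] / norm_sq, dm(x)[1] / norm_sq)


def gradient_finite_difference(x, h=1e-6):
    """Central finite differences of phi around t = m(x)."""
    t = m(x)
    g1 = (phi((t[0] + h, t[1]), x) - phi((t[0] - h, t[1]), x)) / (2 * h)
    g2 = (phi((t[0], t[1] + h), x) - phi((t[0], t[1] - h), x)) / (2 * h)
    return (g1, g2)
```

At `x = 1.0` the formula gives `(0.2, 0.4)`, and the finite-difference gradient agrees with it up to numerical error. Newton's method is used here only for convenience; it finds a stationary point near the starting value, which is the minimizer as long as $t$ is close to the curve.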


Let $m = m(x)$, so that $\phi(m) = x$. The normalized remainder of a first order approximation to $\phi$ at such a point is given by

$$\begin{aligned}
\Delta(t, m) &= \frac{1}{\|\partial_m \phi(m)\| \cdot \|t - m\|} \cdot \left| \phi(t) - \phi(m) - \partial_m \phi(m) \cdot (t - m) \right| \\
&= \frac{\|\partial_x m\|^2}{\|\partial_x m\| \cdot \|t - m\|} \cdot \left| \phi(t) - x - \|\partial_x m\|^{-2} \cdot \partial_x m \cdot (t - m) \right|.
\end{aligned} \tag{19}$$


The magnitude of this remainder depends on the curvature of the manifold traced by $m(\cdot)$, as well as on the parametrization which maps $x$ to this manifold. The remainder will be non-negligible to the extent that either the manifold or the parametrization deviates from linearity. If the manifold has kinks, that is, points of non-differentiability, then our discussion of moment inequalities immediately applies. If the manifold is smooth but has a high curvature, then the delta-method will provide poor approximations in finite samples as well. As a practical approach, we suggest numerically evaluating $\Delta$ for a range of plausible values of $m$ and $t$.
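One way such a numerical evaluation could look is sketched below in Python. Everything specific in it is an assumption made for illustration: the moment curve $m(x) = (x, c \cdot x^2)$ with a curvature parameter $c$, the use of Newton's method to compute the minimum distance estimate, and the choice to evaluate $\Delta$ from equation (19) on a circle of given radius around $m(x)$.

```python
import math


def make_curve(curvature):
    """Hypothetical moment curve m(x) = (x, curvature * x^2) with derivatives."""
    def m(x):
        return (x, curvature * x * x)

    def dm(x):
        return (1.0, 2.0 * curvature * x)

    def d2m(x):
        return (0.0, 2.0 * curvature)

    return m, dm, d2m


def min_distance(t, curve, x_start, steps=100):
    """Solve the first-order condition dm(x) . (t - m(x)) = 0 by Newton's method."""
    m, dm, d2m = curve
    x = x_start
    for _ in range(steps):
        r = (t[0] - m(x)[0], t[1] - m(x)[1])
        g = dm(x)[0] * r[0] + dm(x)[1] * r[1]
        gp = d2m(x)[0] * r[0] + d2m(x)[1] * r[1] - dm(x)[0] ** 2 - dm(x)[1] ** 2
        x -= g / gp
    return x


def delta(t, x, curve):
    """Normalized remainder of equation (19) at m = m(x)."""
    m, dm, _ = curve
    mx, d = m(x), dm(x)
    diff = (t[0] - mx[0], t[1] - mx[1])
    norm_d = math.hypot(d[0], d[1])
    linear = (d[0] * diff[0] + d[1] * diff[1]) / norm_d ** 2
    remainder = min_distance(t, curve, x) - x - linear
    return norm_d * abs(remainder) / math.hypot(diff[0], diff[1])


def max_delta(curve, x, radius, points=24):
    """Largest Delta over equally spaced points t on a circle around m(x)."""
    mx = curve[0](x)
    return max(
        delta((mx[0] + radius * math.cos(2 * math.pi * k / points),
               mx[1] + radius * math.sin(2 * math.pi * k / points)), x, curve)
        for k in range(points)
    )
```

For a linear curve, `max_delta(make_curve(0.0), 1.0, 0.2)` is zero up to rounding, since the first-order approximation is then exact. For `make_curve(1.0)` the value at radius 0.2 is clearly positive and much larger than at radius 0.01. In an application, the radius would be chosen to match the plausible sampling variation of $T_n$ around $m$.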


6 Conclusion 

Questions regarding the uniform validity of statistical procedures have figured prominently in the econometrics literature in recent years: many conventional procedures perform poorly for some parameter configurations, for any given sample size, despite being asymptotically valid for all parameter values. We argue that a central reason for such lack of uniform validity of asymptotic approximations lies in failures of the delta-method to be uniformly valid.

In this paper, we provide a condition which is both necessary and sufficient for uniform 
negligibility of the remainder of delta-method type approximations. This condition involves a 
uniform bound on the behavior of the remainder of a Taylor approximation. We demonstrate in 
a number of examples that this condition is fairly straightforward to check, either analytically 
or numerically. The stylized examples we consider, and for which our necessary condition fails to hold, include $1/t$, $|t|$, and $\sqrt{t}$. In each of these cases problems arise in a neighborhood of $t = 0$. Problems can also arise for large $t$. We finally discuss three more realistic examples: weak instruments, moment inequalities, and minimum distance estimation.
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A Proofs 


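Proof of lemma 1

1. To see that convergence along any sequence $\theta_n$ follows from this condition, note that

$$\sup_\theta d^{\theta}_{BL}(X_n, Y_n) \ge d^{\theta_n}_{BL}(X_n, Y_n).$$

To see that convergence along any sequence implies this condition, note that

$$\sup_\theta d^{\theta}_{BL}(X_n, Y_n) \not\to 0$$

implies that there exist $\epsilon > 0$, and sequences $\theta_m$, $n_m \to \infty$, such that

$$d^{\theta_m}_{BL}(X_{n_m}, Y_{n_m}) > \epsilon$$

for all $m$.

2. Similarly

$$\sup_\theta P^{\theta}(\|X_n - Y_n\| > \epsilon) \ge P^{\theta_n}(\|X_n - Y_n\| > \epsilon)$$

shows sufficiency, and

$$\sup_\theta P^{\theta}(\|X_n - Y_n\| > \epsilon) \not\to 0$$

implies that there exist $\epsilon' > 0$ and sequences $\theta_m$, $n_m \to \infty$, such that

$$P^{\theta_m}(\|X_{n_m} - Y_{n_m}\| > \epsilon) > \epsilon'$$

for all $m$.

□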


Proof of lemma 2

Fix an arbitrary sequence $\theta_n$. Uniform convergence in distribution of $Z_n$ to $Z$ implies convergence in distribution of $Z_n$ to $Z$ along this sequence. By Portmanteau’s lemma (van der Vaart 2000, p6), convergence in distribution of $Z_n$ to $Z$ along this sequence is equivalent to convergence of $F_{Z_n}(z)$ to $F_Z(z)$ at all continuity points $z$ of $F_Z(\cdot)$. Since we assume the latter to be continuous, convergence holds at all points $z$, and thus in particular at the critical value $z$. The claim follows. □


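Proof of theorem 1

Let $1 \le \kappa < \infty$ be such that $\|\psi(x) - \psi(y)\| \le \kappa \cdot \|x - y\|$ for all $x, y$.

1. Note that $h \in BL_1$ implies $h' := \kappa^{-1} \cdot h \circ \psi \in BL_1$:

$$\begin{aligned}
|h'(x) - h'(y)| &= \kappa^{-1} |h(\psi(x)) - h(\psi(y))| && \text{(by definition of } h') \\
&\le \kappa^{-1} \|\psi(x) - \psi(y)\| && \text{(since } h \in BL_1) \\
&\le \kappa^{-1} \kappa \|x - y\| && \text{(since } \psi \text{ is Lipschitz-continuous with parameter } \kappa),
\end{aligned}$$

and $|h'(x)| \le 1$ for $\kappa \ge 1$.

By definition of the Lipschitz metric

$$\begin{aligned}
d^{\theta}_{BL}(\psi(X_n), \psi(Y_n)) &= \sup_{h \in BL_1} \left| E^{\theta}[h(\psi(X_n))] - E^{\theta}[h(\psi(Y_n))] \right| \\
&\le \kappa \cdot \sup_{h' \in BL_1} \left| E^{\theta}[h'(X_n)] - E^{\theta}[h'(Y_n)] \right| \\
&= \kappa \cdot d^{\theta}_{BL}(X_n, Y_n).
\end{aligned}$$

$d^{\theta_n}_{BL}(X_n, Y_n) \to 0$ for all sequences $\{\theta_n \in \Theta\}$ therefore implies $d^{\theta_n}_{BL}(\psi(X_n), \psi(Y_n)) \to 0$ for all such sequences.

2. For a given $\epsilon > 0$, let $\delta > 0$ be such that $\|x - y\| \le \delta$ implies $\|\psi(x) - \psi(y)\| \le \epsilon$ for all $x, y$. Such a $\delta$ exists by uniform continuity of $\psi$. For this choice of $\delta$, we get

$$P^{\theta}(\|\psi(X_n) - \psi(Y_n)\| > \epsilon) \le P^{\theta}(\|X_n - Y_n\| > \delta).$$

By uniform convergence in probability, $P^{\theta_n}(\|X_n - Y_n\| > \delta) \to 0$ for all $\delta > 0$ and all sequences $\{\theta_n \in \Theta\}$, which implies $P^{\theta_n}(\|\psi(X_n) - \psi(Y_n)\| > \epsilon) \to 0$ for all such sequences.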


Proof of theorem 2

Define

$$\tilde X_n = E(\mu) D(\mu) \cdot S_n \quad \text{and} \quad R_n = X_n - \tilde X_n.$$

The proof is structured as follows. We show first that under our assumptions $\tilde X_n$ converges uniformly in distribution to $X = E(\mu) D(\mu) \cdot S$. This is a consequence of uniform convergence in distribution of $S_n$ to $S$ and the boundedness of $E(\mu) D(\mu)$.

We then show that $R_n$ converges to 0 in probability uniformly under the sufficient condition (11). This, in combination with the first result, implies uniform convergence in distribution of $X_n$ to $X$, by Slutsky’s theorem applied along arbitrary sequences $\theta_n$.

We finally show that $R_n$ diverges along some sequence under condition (12). This implies that $X_n = \tilde X_n + R_n$ cannot converge in distribution along this sequence.

Uniform convergence in distribution of $\tilde X_n$ to $X$:

Note that

$$d^{\theta}_{BL}(\tilde X_n, X) \le d_x \cdot d^{\theta}_{BL}(S_n, S).$$

This holds since multiplication by $E(\mu) D(\mu)$ is a Lipschitz continuous function with Lipschitz constant $d_x$. To see this note that each of the $d_x$ rows of $E(\mu) D(\mu)$ has norm 1 by construction. Since $d^{\theta_n}_{BL}(S_n, S) \to 0$ by assumption, the same holds for $d^{\theta_n}_{BL}(\tilde X_n, X)$.


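Uniform convergence in probability of $X_n$ to $\tilde X_n$ under condition (11):

We can write

$$\begin{aligned}
\|R_n\| &= \Delta(T_n, \mu) \cdot r_n \cdot \|T_n - \mu\| \\
&= \Delta(\mu + S_n/r_n, \mu) \cdot \|S_n\| \\
&\le \delta(\|S_n\|/r_n) \cdot \|S_n\|.
\end{aligned} \tag{20}$$

Fix $M$ such that $P^{\theta}(\|S\| > M) < \epsilon/2$ for all $\theta$; this is possible by tightness of $S$ as imposed in assumption 1. By uniform convergence in distribution of $S_n$ to the continuously distributed $S$, this implies $P^{\theta_n}(\|S_n\| > M) < \epsilon$ for any sequence $\theta_n \subset \Theta$ and $n$ large enough. We get

$$\begin{aligned}
P^{\theta_n}(\|R_n\| > \epsilon) &\le P^{\theta_n}(\|S_n\| > M) + P^{\theta_n}(\|S_n\| \le M, \ \delta(M/r_n) > \epsilon/M) \\
&= P^{\theta_n}(\|S_n\| > M) < \epsilon
\end{aligned}$$

for any sequence $\theta_n \subset \Theta$ and $n$ large enough, using condition (11). But this implies $P^{\theta_n}(\|R_n\| > \epsilon) \to 0$.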

Existence of a diverging sequence under condition (12):

Let $\theta_n$ be such that $\mu(\theta_n) = m_n$, where $(m_n, \epsilon'_n)$ is a sequence such that condition (12) holds. By equation (20),

$$\begin{aligned}
P^{\theta_n}(\|R_n\| > \epsilon'_n / s) &\ge P^{\theta_n}(S_n \in A, \ \Delta(m_n + S_n/r_n, m_n) > \epsilon'_n) \\
&= P^{\theta_n}(S_n \in A) \ge p/2
\end{aligned}$$

for $n$ large enough, under the conditions imposed, using again uniform convergence in distribution of $S_n$ to $S$.

Note that $R_n = X_n - \tilde X_n$ and thus $\|R_n\| \le \|X_n\| + \|\tilde X_n\|$, which implies

$$P(\|R_n\| > \epsilon'_n / s) \le P(\|X_n\| > \epsilon'_n / (2s)) + P(\|\tilde X_n\| > \epsilon'_n / (2s)).$$

Suppose that $X_n \to^d X$ (and recall $\tilde X_n \to^d X$) for the given sequence $\theta_n$. Since $X$ is tight and $\epsilon'_n \to \infty$, this implies $P(\|X_n\| > \epsilon'_n / (2s)) \to 0$, similarly for $\tilde X_n$, and thus

$$P(\|R_n\| > \epsilon'_n / s) \to 0.$$

Contradiction. This implies that we cannot have $X_n \to^d X$ for the given sequence $\theta_n$. □


Proof of theorem 3

If $\mu(\Theta) \subset \mathbb{R}^{d_t}$ is compact, then so is $\mu(\Theta)^{\epsilon}$. Since $\partial_t \phi$ is assumed to be continuous on $\mu(\Theta)^{\epsilon}$, it follows immediately that $\partial_t \phi$ is bounded on this set, and we also get $\|E(\mu)\| \le \bar E$ for all $\mu \in \mu(\Theta)^{\epsilon}$. Theorem 4.19 in Rudin (1991) furthermore implies that $\partial_t \phi$ is uniformly continuous on $\mu(\Theta)^{\epsilon}$.


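Consider now first the case $\dim(x) = 1$. Suppose $\|t_1 - m\| \le \epsilon$ and $t_1, m \in \mu(\Theta)$. By continuous differentiability and the mean value theorem, we can write

$$\phi(t_1) - \phi(m) = \partial_t \phi(t_2) \cdot (t_1 - m),$$

where

$$t_2 = \alpha t_1 + (1 - \alpha) m \in \mu(\Theta)^{\epsilon}$$

and $\alpha \in [0, 1]$. We get

$$\begin{aligned}
\Delta(t_1, m) &= \frac{1}{\|t_1 - m\|} \left\| E(m) \cdot \left( \phi(t_1) - \phi(m) - \partial_m \phi(m) \cdot (t_1 - m) \right) \right\| \\
&\le \frac{\bar E}{\|t_1 - m\|} \left\| \left( \partial_t \phi(t_2) - \partial_m \phi(m) \right) \cdot (t_1 - m) \right\| \\
&\le \bar E \cdot \|\partial_t \phi(t_2) - \partial_m \phi(m)\|.
\end{aligned}$$

Uniform continuity of $\partial_t \phi$ implies that for any $\delta > 0$ there is an $\epsilon'$ such that $\|t_2 - m\| < \epsilon'$ implies that $\bar E \cdot \|\partial_t \phi(t_2) - \partial_m \phi(m)\| < \delta$. Since $\|m - t_2\| \le \|m - t_1\|$, this implies that there exists a function $\delta(\cdot)$ such that

$$\Delta(t_1, m) \le \delta(\|m - t_1\|),$$

and $\delta$ goes to 0 as its argument goes to 0, so that condition (11) is satisfied.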
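The argument can be illustrated with the square-root example restricted to a compact set that is bounded away from zero. The Python sketch below is only an illustration under assumptions of ours: $\phi(t) = \sqrt{t}$, and $t$, $m$ restricted to the interval $[1, 4]$. On this interval $E(m) = 2\sqrt{m} \le \bar E = 4$ and $|\phi''| \le 1/4$, so the proof's bound gives $\delta(d) = \bar E \cdot d / 4 = d$.

```python
import math


def delta(t, m):
    """Delta(t, m) for phi(t) = sqrt(t), computed from the definition."""
    d_m = 1.0 / (2.0 * math.sqrt(m))
    e_m = 2.0 * math.sqrt(m)
    remainder = math.sqrt(t) - math.sqrt(m) - d_m * (t - m)
    return e_m * abs(remainder) / abs(t - m)


def delta_bound(distance, lower=1.0, upper=4.0):
    """Bounding function E_bar * L * distance on the interval [lower, upper]."""
    e_bar = 2.0 * math.sqrt(upper)            # sup of E(m) = 2 sqrt(m)
    lipschitz = 1.0 / (4.0 * lower ** 1.5)    # sup of |phi''|, Lipschitz constant of phi'
    return e_bar * lipschitz * distance


def bound_holds(lower=1.0, upper=4.0, points=61):
    """Check Delta(t, m) <= delta_bound(|t - m|) on a grid of the interval."""
    grid = [lower + (upper - lower) * k / (points - 1) for k in range(points)]
    return all(
        delta(t, m) <= delta_bound(abs(t - m), lower, upper)
        for t in grid for m in grid if t != m
    )
```

On $[1, 4]$ the check `bound_holds()` succeeds, and the bound vanishes as the distance shrinks. Without the restriction, `delta(1e-6, 4e-6)` is still one third although the two points are only `3e-6` apart, which is the failure of uniformity near zero discussed in section 5.1.4.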

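Let us now consider the case $\dim(x) = d_x > 1$. By the same arguments as for the case $\dim(x) = 1$, we get for the $i$th component of $\phi$ that

$$\begin{aligned}
\Delta_i(t_1, m) &:= \frac{E_i(m)}{\|t_1 - m\|} \left\| \phi_i(t_1) - \phi_i(m) - \partial_m \phi_i(m) \cdot (t_1 - m) \right\| \\
&\le \bar E \cdot \|\partial_t \phi_i(t_{2,i}) - \partial_m \phi_i(m)\|,
\end{aligned}$$

where $E_i(m)$ denotes the $i$th diagonal entry of $E(m)$ and

$$t_{2,i} = \alpha_i t_1 + (1 - \alpha_i) m \in \mu(\Theta)^{\epsilon}.$$

As before, uniform continuity of $\partial_t \phi_i$ implies existence of a function $\delta_i$ such that

$$\Delta_i(t_1, m) \le \delta_i(\|m - t_1\|),$$

and $\delta_i$ goes to 0 as its argument goes to 0. By construction

$$\Delta(t_1, m) = \sqrt{\sum_i \Delta_i(t_1, m)^2} \le d_x \cdot \max_i \Delta_i(t_1, m).$$

Setting

$$\delta(\cdot) = d_x \cdot \max_i \delta_i(\cdot)$$

then yields a function $\delta(\cdot)$ which satisfies the required condition (11). □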


Proof of lemma 3

By definition 5, we need to show convergence in distribution (i.e., convergence with respect to the bounded Lipschitz metric) along arbitrary sequences $\theta_n$.

Consider such a sequence $\theta_n$, and define $\tilde Y_{in} := \Sigma^{-1/2}(\theta_n) \cdot (Y_i - \mu(\theta_n))$, so that $\mathrm{Var}(\tilde Y_{in}) = I$. Then the triangular array $\{\tilde Y_{1n}, \ldots, \tilde Y_{nn}\}$ with distribution corresponding to $\theta_n$ satisfies the conditions of Lyapunov’s central limit theorem, and thus those of the Lindeberg-Feller CLT as stated in van der Vaart (2000, proposition 2.27, p20). We therefore have

$$\tilde S_n := \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \tilde Y_{in} \to \tilde Z \sim N(0, I)$$

in distribution, and thus with respect to the bounded Lipschitz metric, that is

$$d^{\theta_n}_{BL}(\tilde S_n, \tilde Z) \to 0.$$

Now consider $S_n = \Sigma^{1/2}(\theta_n) \cdot \tilde S_n$ and $Z = \Sigma^{1/2}(\theta_n) \cdot \tilde Z$. We have

$$d^{\theta_n}_{BL}(S_n, Z) \le \|\Sigma^{1/2}(\theta_n)\| \cdot d^{\theta_n}_{BL}(\tilde S_n, \tilde Z).$$

This follows from the definition of the bounded Lipschitz metric, again by the same argument as in the proof of theorem 1. Since $d^{\theta_n}_{BL}(\tilde S_n, \tilde Z) \to 0$, and $\|\Sigma^{1/2}\|$ is bounded by assumption, we get $d^{\theta_n}_{BL}(S_n, Z) \to 0$, and the claim follows. □


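Proof of proposition 1

This follows immediately from theorem 2 and Slutsky’s theorem, applied along any sequence of $\theta_n$. □

Derivation of equation (17):

$$\begin{aligned}
\frac{\|t - m\|}{E(m)} \cdot \Delta(t, m)
&= \left| \frac{t_1}{t_2} - \frac{m_1}{m_2} - \frac{1}{m_2}(t_1 - m_1) + \frac{m_1}{m_2^2}(t_2 - m_2) \right| \\
&= \left| \frac{t_1 m_2 - t_2 m_1}{t_2 m_2} - \frac{1}{m_2}(t_1 - m_1) + \frac{m_1}{m_2^2}(t_2 - m_2) \right| \\
&= \left| \frac{t_1 m_2 - m_1 m_2 + m_1 m_2 - t_2 m_1}{t_2 m_2} - \frac{1}{m_2}(t_1 - m_1) + \frac{m_1}{m_2^2}(t_2 - m_2) \right| \\
&= \left| (t_1 - m_1) \left( \frac{1}{t_2} - \frac{1}{m_2} \right) - \frac{m_1}{m_2}(t_2 - m_2) \left( \frac{1}{t_2} - \frac{1}{m_2} \right) \right| \\
&= \left| (t_1 - m_1) - \frac{m_1}{m_2}(t_2 - m_2) \right| \cdot \left| \frac{t_2 - m_2}{t_2 m_2} \right|.
\end{aligned}$$

Multiplying both sides by $E(m)/\|t - m\| = m_2^2/(\|m\| \cdot \|t - m\|)$ gives equation (17).

□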
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